16 Bayesian Inference

Question: let $U_1, U_2, \dots$ be i.i.d. $\mathrm{Uniform}(0,1)$ and let $N = \min\{n : U_1 + \cdots + U_n > 1\}$. What is $E[N]$?

Drawing the region as in Lecture 16, we have $P(U_1 + U_2 \le 1) = \frac{1}{2}$.
Drawing the corresponding 3-d region, we have $P(U_1 + U_2 + U_3 \le 1) = \frac{1}{6}$.
Consider the following polytope in $\mathbb{R}^n$: $\Delta_n = \{(u_1, \dots, u_n) \in \mathbb{R}^n \mid 0 \le u_i \le 1,\ u_1 + \cdots + u_n \le 1\}$. Then $\mathrm{Vol}(\Delta_n) = \frac{1}{n!}$. (We can show this by multi-dimensional integration and induction.) So $P(U_1 + \cdots + U_n \le 1) = \frac{1}{n!}$.
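A quick Monte Carlo check (my own sketch, not part of the notes) of $\mathrm{Vol}(\Delta_n) = \frac{1}{n!}$, i.e. $P(U_1 + \cdots + U_n \le 1) = \frac{1}{n!}$:

```python
# Estimate P(U_1 + ... + U_n <= 1) for i.i.d. Uniform(0,1) draws and
# compare against 1/n!.
import math
import random

random.seed(0)

def prob_sum_le_one(n, trials=200_000):
    """Estimate P(U_1 + ... + U_n <= 1) by simulation."""
    hits = sum(1 for _ in range(trials)
               if sum(random.random() for _ in range(n)) <= 1)
    return hits / trials

for n in (2, 3, 4):
    print(f"n={n}: estimate={prob_sum_le_one(n):.4f}, 1/n!={1 / math.factorial(n):.4f}")
```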
Define $E_k$ as the event $U_1 + \cdots + U_k \le 1$. Naturally $E_n \subseteq E_{n-1}$. Then for $n \ge 2$,
$$P(N = n) = P(E_{n-1} \cap E_n^c) = P(E_{n-1}) - P(E_{n-1} \cap E_n) = \frac{1}{(n-1)!} - \frac{1}{n!} = \frac{n-1}{n!}.$$
Thus
$$E[N] = \sum_{n=2}^{\infty} n \, P(N = n) = \sum_{n=2}^{\infty} \frac{1}{(n-2)!} = e.$$
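A simulation sketch (my own) of this answer: draw $N$ directly and average.

```python
# Simulate N = min{n : U_1 + ... + U_n > 1}; the derivation above gives E[N] = e.
import math
import random

random.seed(1)

def draw_N():
    """Add Uniform(0,1) draws until the running sum exceeds 1; return the count."""
    total, n = 0.0, 0
    while total <= 1:
        total += random.random()
        n += 1
    return n

trials = 200_000
est = sum(draw_N() for _ in range(trials)) / trials
print(f"E[N] estimate: {est:.4f}  (e = {math.e:.4f})")
```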


See also this note from Statistics Theory on Bayes estimation.

1 Bayesian Inference

$X$: observed data. $\Theta$: unknown parameter(s). Assume for now that all are continuous random variables.
Law of Total Probability:
$$f_X(x) = \int_{-\infty}^{+\infty} f_{X \mid \Theta = \theta}(x) \, f_\Theta(\theta) \, d\theta.$$
Bayes' Rule for continuous random variables:
$$f_{\Theta \mid X = x}(\theta) = \frac{f_{X \mid \Theta = \theta}(x) \, f_\Theta(\theta)}{\int_{-\infty}^{+\infty} f_{X \mid \Theta = \theta}(x) \, f_\Theta(\theta) \, d\theta}.$$
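A numerical sketch of this rule: discretize $\theta$ on a grid, multiply likelihood by prior, and normalize by a Riemann-sum approximation of the integral. The Uniform prior and Binomial data here are my own illustrative choices.

```python
# Grid-based posterior: likelihood * prior, normalized by the integral.
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid over (0, 1)
dt = theta[1] - theta[0]
prior = np.ones_like(theta)              # Theta ~ Uniform(0,1), i.e. Beta(1,1)
n, x = 10, 7                             # observe 7 successes in 10 trials
like = theta**x * (1 - theta)**(n - x)   # Binomial likelihood, up to a constant
post = like * prior
post /= post.sum() * dt                  # normalize: approximate the integral
post_mean = (theta * post).sum() * dt
print("posterior mean:", post_mean)      # conjugacy says Beta(8,4): mean 8/12
```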
If $X$ or $\Theta$ is discrete, use a p.m.f. instead of a p.d.f.

Example (Binomial)

$(X \mid \Theta = \theta) \sim \mathrm{Binomial}(n, \theta)$ and $\Theta \sim \mathrm{Beta}(\alpha, \beta)$; then $(\Theta \mid X = x) \sim \mathrm{Beta}(\alpha + x, \beta + n - x)$. Indeed,
$$P(X = x \mid \Theta = \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n - x} \, \mathbf{1}\{x \in \{0, 1, \dots, n\}\},$$
so
$$f_{\Theta \mid X = x}(\theta) \propto P(X = x \mid \Theta = \theta) \, f_\Theta(\theta).$$
If $f_\Theta(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}$, then
$$f_{\Theta \mid X = x}(\theta) \propto \theta^{\alpha + x - 1} (1 - \theta)^{\beta + n - x - 1}.$$
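The conjugate update above as a tiny helper (function name my own):

```python
# Beta(alpha, beta) prior + x successes in n Binomial trials
# -> Beta(alpha + x, beta + n - x) posterior.
def beta_binomial_update(alpha, beta, x, n):
    """Return the posterior Beta parameters after x successes in n trials."""
    return alpha + x, beta + (n - x)

a, b = beta_binomial_update(2, 2, 7, 10)
print((a, b), "posterior mean:", a / (a + b))   # Beta(9, 5), mean 9/14
```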

For $d$ dimensions, let $X = (X_1, \dots, X_d)$ and $\theta = (\theta_1, \dots, \theta_d)$ with $\sum_{i=1}^d \theta_i = 1$, and take $(X \mid \Theta = \theta) \sim \mathrm{Multinomial}(n, \theta_1, \dots, \theta_d)$. Then
$$P(X = x \mid \Theta = \theta) = \binom{n}{x_1, \dots, x_d} \theta_1^{x_1} \cdots \theta_d^{x_d} \, \mathbf{1}\Big\{\sum_{i=1}^d x_i = n\Big\} \prod_{i=1}^d \mathbf{1}\{x_i \in \{0, \dots, n\}\}.$$
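The multinomial p.m.f. above, with the indicator constraints made explicit (function name my own):

```python
# P(X = x | Theta = theta) for X ~ Multinomial(n, theta), returning 0 when the
# indicator constraints fail.
import math

def multinomial_pmf(x, theta, n):
    """Multinomial coefficient times the product of theta_i**x_i."""
    if sum(x) != n or any(not (0 <= xi <= n) for xi in x):
        return 0.0
    coef = math.factorial(n)
    for xi in x:
        coef //= math.factorial(xi)
    p = float(coef)
    for xi, ti in zip(x, theta):
        p *= ti**xi
    return p

print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25], 4))   # 12 * 0.5^2 * 0.25 * 0.25
```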

Example (Gaussian)

$X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$.

  1. $\mu$ unknown, $\sigma^2$ known.
     Let $X = (X_1, \dots, X_n)$ and treat the unknown mean as a random variable $M$ (with realization $\mu$). Then
     $$f_{X \mid M = \mu}(x) \propto \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \Big\}.$$
     Conjugate prior $M \sim N(\mu_0, \sigma_0^2)$:
     $$f_M(\mu) \propto \exp\Big\{ -\frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \Big\}.$$
     $$f_{M \mid X = x}(\mu) \propto f_{X \mid M = \mu}(x) \, f_M(\mu) \propto \exp\Big\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (x_i - \mu)^2 - \frac{1}{2\sigma_0^2} (\mu - \mu_0)^2 \Big\} \propto \exp\Big\{ -\frac{1}{2\sigma_n^2} (\mu - \mu_n)^2 \Big\},$$

where
$$\mu_n = \Big( \frac{\sigma^2}{n\sigma_0^2 + \sigma^2} \Big) \mu_0 + \Big( \frac{n\sigma_0^2}{n\sigma_0^2 + \sigma^2} \Big) \mu_{\mathrm{ML}}, \qquad \mu_{\mathrm{ML}} \triangleq \frac{1}{n} \sum_{i=1}^n X_i, \qquad \frac{1}{\sigma_n^2} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}.$$
(Posterior precision = prior precision + data precision.)

  1. $\mu_n \to \mu_{\mathrm{ML}}$ as $n \to \infty$.
  2. Precisions are additive.
  3. The posterior precision grows as the sample size grows.
  4. For finite $n$, if $\sigma_0^2 \to \infty$, then $\mu_n \to \mu_{\mathrm{ML}}$ and $\sigma_n^2 \to \frac{\sigma^2}{n}$.
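The update above as a small helper (function name my own): a precision-weighted average of the prior mean and $\mu_{\mathrm{ML}}$.

```python
# Posterior (mu_n, sigma_n^2) for the mean M of i.i.d. N(mu, sigma2) data
# with sigma2 known and prior M ~ N(mu0, sigma0_2).
def gaussian_mean_posterior(xs, sigma2, mu0, sigma0_2):
    """Return (mu_n, sigma_n^2); precisions are additive."""
    n = len(xs)
    mu_ml = sum(xs) / n
    sigma_n2 = 1 / (1 / sigma0_2 + n / sigma2)     # 1/sigma_n^2 = 1/sigma0^2 + n/sigma^2
    w = n * sigma0_2 / (n * sigma0_2 + sigma2)     # weight on the data
    return (1 - w) * mu0 + w * mu_ml, sigma_n2

mu_n, s2 = gaussian_mean_posterior([1.0, 2.0, 3.0], sigma2=1.0, mu0=0.0, sigma0_2=1.0)
print(mu_n, s2)   # w = 3/4: mu_n = 0.75 * 2 = 1.5, sigma_n^2 = 1/4
```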

  2. $\mu$ known, $\sigma^2$ unknown.
     Put a prior on the precision $\Lambda = \frac{1}{\sigma^2}$. Then
     $$f_{X \mid \Lambda = \lambda}(x) = \Big( \frac{\lambda}{2\pi} \Big)^{n/2} \exp\Big\{ -\frac{\lambda}{2} \sum_{i=1}^n (x_i - \mu)^2 \Big\}.$$
     Conjugate prior $\Lambda \sim \mathrm{Gamma}(\alpha_0, \beta_0)$:
     $$f_\Lambda(\lambda) = \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)} \lambda^{\alpha_0 - 1} e^{-\beta_0 \lambda}.$$
     $$f_{\Lambda \mid X = x}(\lambda) \propto f_{X \mid \Lambda = \lambda}(x) \, f_\Lambda(\lambda) \propto \lambda^{\alpha_0 + \frac{n}{2} - 1} \exp\Big\{ -\lambda \Big[ \beta_0 + \frac{1}{2} \sum_{i=1}^n (x_i - \mu)^2 \Big] \Big\}.$$
     Writing $\sigma_{\mathrm{ML}}^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2$, we get $(\Lambda \mid X = x) \sim \mathrm{Gamma}(\alpha_n, \beta_n)$ with $\alpha_n = \alpha_0 + \frac{n}{2}$, $\beta_n = \beta_0 + \frac{n}{2} \sigma_{\mathrm{ML}}^2$.
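The Gamma update above as a small helper (function name my own):

```python
# Posterior (alpha_n, beta_n) for the precision Lambda = 1/sigma^2
# of i.i.d. N(mu, sigma^2) data with mu known and Lambda ~ Gamma(alpha0, beta0).
def precision_posterior(xs, mu, alpha0, beta0):
    """Return (alpha0 + n/2, beta0 + (n/2) * sigma_ML^2)."""
    n = len(xs)
    sigma_ml2 = sum((x - mu)**2 for x in xs) / n
    return alpha0 + n / 2, beta0 + n * sigma_ml2 / 2

a_n, b_n = precision_posterior([1.0, -1.0, 2.0, -2.0], mu=0.0, alpha0=2.0, beta0=1.0)
print(a_n, b_n)   # alpha_n = 2 + 4/2 = 4, beta_n = 1 + (4/2) * 2.5 = 6
```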

  3. Both $\mu, \sigma^2$ unknown.
     $$f_{X \mid M = \mu, \Lambda = \lambda}(x) \propto \Big[ \lambda^{\frac{1}{2}} e^{-\frac{\lambda \mu^2}{2}} \Big]^n \exp\Big\{ \lambda \mu \sum_{i=1}^n x_i - \frac{\lambda}{2} \sum_{i=1}^n x_i^2 \Big\}.$$
     The conjugate prior factors as $f_{M, \Lambda}(\mu, \lambda) = f_{M \mid \Lambda = \lambda}(\mu) \, f_\Lambda(\lambda)$ with $M \mid \Lambda = \lambda \sim N\big(\mu_0, \frac{1}{c\lambda}\big)$ and $\Lambda \sim \mathrm{Gamma}(\alpha, \beta)$, where
     $$\mu_0 = \frac{a}{c}, \qquad \alpha = \frac{1 + c}{2}, \qquad \beta = b - \frac{a^2}{2c}, \qquad a, b, c > 0.$$
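The parameter identification above as a tiny helper (function name my own; reading the hyperparameters as $\mu_0 = a/c$, $\alpha = \frac{1+c}{2}$, $\beta = b - \frac{a^2}{2c}$):

```python
# Map the natural parameters (a, b, c) of the normal-gamma prior to
# (mu0, alpha, beta).
def normal_gamma_hyperparams(a, b, c):
    """mu0 = a/c, alpha = (1 + c)/2, beta = b - a^2/(2c)."""
    return a / c, (1 + c) / 2, b - a**2 / (2 * c)

print(normal_gamma_hyperparams(2.0, 3.0, 4.0))   # (0.5, 2.5, 2.5)
```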

2 Model Selection

Double Exponential/Laplace

$X \sim \mathrm{Laplace}(\mu, \beta)$, $\beta > 0$:
$$f_X(x) = \frac{1}{2\beta} \exp\Big( -\frac{|x - \mu|}{\beta} \Big), \qquad E[X] = \mu, \quad \mathrm{Var}(X) = 2\beta^2.$$
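A simulation sketch (my own) of these moment facts, using the standard representation of a Laplace variable as $\mu + \beta(E_1 - E_2)$ with $E_1, E_2$ i.i.d. Exponential(1):

```python
# Check E[X] = mu and Var(X) = 2 * beta^2 for X ~ Laplace(mu, beta).
import random

random.seed(2)
mu, beta = 1.0, 0.5
xs = [mu + beta * (random.expovariate(1) - random.expovariate(1))
      for _ in range(200_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean)**2 for x in xs) / len(xs)
print(f"mean ~ {mean:.3f} (mu = {mu}), var ~ {var:.3f} (2 beta^2 = {2 * beta**2})")
```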

$\Theta \sim \mathrm{Bernoulli}(\frac{1}{2})$. Consider Model 0 vs. Model 1.
Prior odds: $\frac{P(\Theta = 0)}{P(\Theta = 1)}$.
$X_1, \dots, X_n \mid \Theta = 0 \overset{\text{i.i.d.}}{\sim} N(0, 1)$, with p.d.f. $f_0(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$.
$X_1, \dots, X_n \mid \Theta = 1 \overset{\text{i.i.d.}}{\sim} \mathrm{Laplace}\big(0, \sqrt{\pi/2}\big)$, with p.d.f. $f_1(x) = \frac{1}{\sqrt{2\pi}} e^{-|x| \sqrt{2/\pi}}$.
Bayes factor: $\mathrm{BF} = \frac{f_{X \mid \Theta = 0}(x)}{f_{X \mid \Theta = 1}(x)}$.
Posterior odds $= \mathrm{BF} \times \text{prior odds}$.
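A sketch (my own, function name included) of this Bayes-factor computation, done in log space for numerical stability; with the $\mathrm{Bernoulli}(\frac{1}{2})$ prior the prior odds are 1, so the posterior odds equal the Bayes factor itself.

```python
# log BF = log f_{X|Theta=0}(x) - log f_{X|Theta=1}(x) for i.i.d. data, where
# Theta=0 gives N(0,1) and Theta=1 gives Laplace(0, sqrt(pi/2)), whose density
# is (1/sqrt(2*pi)) * exp(-|x| * sqrt(2/pi)).
import math

def log_bayes_factor(xs):
    """Return the log Bayes factor of Model 0 (Gaussian) vs Model 1 (Laplace)."""
    log_f0 = sum(-0.5 * math.log(2 * math.pi) - x**2 / 2 for x in xs)
    log_f1 = sum(-0.5 * math.log(2 * math.pi) - abs(x) * math.sqrt(2 / math.pi)
                 for x in xs)
    return log_f0 - log_f1

print(log_bayes_factor([0.1, -0.2, 0.3]))   # > 0: small nonzero |x| favor the Gaussian
```

Note that both densities equal $\frac{1}{\sqrt{2\pi}}$ at $x = 0$, so a single observation at 0 gives $\log \mathrm{BF} = 0$.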

Now suppose

$$X_1, \dots, X_n \mid \Theta = 0 \overset{\text{i.i.d.}}{\sim} N(0, \alpha^2), \ \alpha > 0 \text{ unknown}; \qquad X_1, \dots, X_n \mid \Theta = 1 \overset{\text{i.i.d.}}{\sim} \mathrm{Laplace}(0, \beta), \ \beta > 0 \text{ unknown}.$$

Put priors on $\alpha$ and $\beta$, treating them as random variables $A$ and $B$.

For example, $\log A \mid \Theta = 0 \sim \mathrm{Uniform}(-c, c)$ and $\log B \mid \Theta = 1 \sim \mathrm{Uniform}(-c, c)$.

$$f_{X \mid \Theta = 0}(x) = \int_0^{+\infty} f_{X \mid \Theta = 0, A = \alpha}(x) \, f_{A \mid \Theta = 0}(\alpha) \, d\alpha.$$

Let $Z = \log A$, so $A = T(Z) = e^Z$. By the change-of-variables formula,
$$f_{A \mid \Theta = 0}(\alpha) = f_{\log A \mid \Theta = 0}(\log \alpha) \, \Big| \frac{d}{d\alpha} \log \alpha \Big| = \frac{1}{\alpha} f_{\log A \mid \Theta = 0}(\log \alpha) = \frac{\mathbf{1}\{-c < \log \alpha < c\}}{2c\alpha}.$$
Similarly,
$$f_{X \mid \Theta = 1}(x) = \int_0^{+\infty} f_{X \mid \Theta = 1, B = \beta}(x) \, f_{B \mid \Theta = 1}(\beta) \, d\beta, \qquad f_{B \mid \Theta = 1}(\beta) = \frac{\mathbf{1}\{-c < \log \beta < c\}}{2c\beta},$$
and $\mathrm{BF} = \frac{f_{X \mid \Theta = 0}(x)}{f_{X \mid \Theta = 1}(x)}$.
(The $\frac{1}{2c}$ prefactor cancels in the Bayes factor, so the answer is insensitive to the exact choice of $c$ once $c$ is large.)
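A numerical check (my own sketch) of the change-of-variables density above: if $\log A \sim \mathrm{Uniform}(-c, c)$, then $f_A(\alpha) = \frac{1}{2c\alpha}$ on $(e^{-c}, e^{c})$, which must integrate to 1 for any $c$.

```python
# Midpoint-rule integral of 1/(2*c*alpha) over (e^-c, e^c); should be ~1
# for every c.
import math

def integral_of_prior(c, grid=100_000):
    """Numerically integrate the density of A = exp(Uniform(-c, c))."""
    lo, hi = math.exp(-c), math.exp(c)
    d = (hi - lo) / grid
    return sum(d / (2 * c * (lo + (i + 0.5) * d)) for i in range(grid))

print(integral_of_prior(1.0), integral_of_prior(2.0))   # both ~ 1.0
```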